AITopics | heterogeneous dataset

Collaborating Authors

heterogeneous dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LAPO: Latent-VariableAdvantage-WeightedPolicy OptimizationforOfflineReinforcementLearning

Neural Information Processing SystemsFeb-12-2026, 18:26:51 GMT

But in practice, it requires querying the behavior policy which is unknown, and using an erroneous approximation of the behavior policy can negatively affect the performance ([39]).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Personalized Dictionary Learning for Heterogeneous Datasets

Neural Information Processing SystemsDec-26-2025, 11:10:34 GMT

We introduce a relevant yet challenging problem named Personalized Dictionary Learning (PerDL), where the goal is to learn sparse linear representations from heterogeneous datasets that share some commonality. In PerDL, we model each dataset's shared and unique features as global and local dictionaries. Challenges for PerDL not only are inherited from classical dictionary learning(DL), but also arise due to the unknown nature of the shared and unique features. In this paper, we rigorously formulate this problem and provide conditions under which the global and local dictionaries can be provably disentangled. Under these conditions, we provide a meta-algorithm called Personalized Matching and Averaging (PerMA) that can recover both global and local dictionaries from heterogeneous datasets. PerMA is highly efficient; it converges to the ground truth at a linear rate under suitable conditions. Moreover, it automatically borrows strength from strong learners to improve the prediction of weak learners. As a general framework for extracting global and local dictionaries, we show the application of PerDL in different learning tasks, such as training with imbalanced datasets and video surveillance.

heterogeneous dataset, name change, personalized dictionary learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

H-nobs: Achieving Certified Fairness and Robustness in Distributed Learning on Heterogeneous Datasets

Neural Information Processing SystemsDec-25-2025, 21:40:16 GMT

certified fairness and robustness, fairness and robustness, h-nob, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

An Efficient Privacy-preserving Intrusion Detection Scheme for UAV Swarm Networks

Gharami, Kanchon, Moni, Shafika Showkat

arXiv.org Artificial IntelligenceDec-1-2025

The rapid proliferation of unmanned aerial vehicles (UAVs) and their applications in diverse domains, such as surveillance, disaster management, agriculture, and defense, have revolutionized modern technology. While the potential benefits of swarm-based UAV networks are growing significantly, they are vulnerable to various security attacks that can jeopardize the overall mission success by degrading their performance, disrupting decision-making, and compromising the trajectory planning process. The Intrusion Detection System (IDS) plays a vital role in identifying potential security attacks to ensure the secure operation of UAV swarm networks. However, conventional IDS primarily focuses on binary classification with resource-intensive neural networks and faces challenges, including latency, privacy breaches, increased performance overhead, and model drift. This research aims to address these challenges by developing a novel lightweight and federated continuous learning-based IDS scheme. Our proposed model facilitates decentralized training across diverse UAV swarms to ensure data heterogeneity and privacy. The performance evaluation of our model demonstrates significant improvements, with classification accuracies of 99.45% on UKM-IDS, 99.99% on UAV-IDS, 96.85% on TLM-UAV dataset, and 98.05% on Cyber-Physical datasets.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.22791

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Checklist 1. For all authors (a)

Neural Information Processing SystemsAug-19-2025, 17:57:05 GMT

Do the main claims made in the abstract and introduction accurately reflect the paper's Did you describe the limitations of your work? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Y es] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) Therefore, an additional tuning process is needed to find an appropriate λ for different tasks. For better visualization, the scores are smoothed by a window with length 20. For better visualization, the scores are smoothed by a window with length 20.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning Xi Chen

Neural Information Processing SystemsAug-19-2025, 17:57:02 GMT

In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios, such as data from several human demonstrators or from

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets

Sheng, Zhichao, Zhou, Shilin, Gong, Chen, Li, Zhenghua

arXiv.org Artificial IntelligenceJul-18-2025

Spoken Language Understanding (SLU) plays a crucial role in speech-centric multimedia applications, enabling machines to comprehend spoken language in scenarios such as meetings, interviews, and customer service interactions. SLU encompasses multiple tasks, including Automatic Speech Recognition (ASR), spoken Named Entity Recognition (NER), and spoken Sentiment Analysis (SA). However, existing methods often rely on separate model architectures for individual tasks such as spoken NER and SA, which increases system complexity, limits cross-task interaction, and fails to fully exploit heterogeneous datasets available across tasks. To address these limitations, we propose UniSLU, a unified framework that jointly models multiple SLU tasks within a single architecture. Specifically, we propose a unified representation for diverse SLU tasks, enabling full utilization of heterogeneous datasets across multiple tasks. Built upon this representation, we propose a unified generative method that jointly models ASR, spoken NER, and SA tasks, enhancing task interactions and enabling seamless integration with large language models to harness their powerful generative capabilities. Extensive experiments on public SLU datasets demonstrate the effectiveness of our approach, achieving superior SLU performance compared to several benchmark methods, making it well-suited for real-world speech-based multimedia scenarios. We will release all code and models at github to facilitate future research.

artificial intelligence, natural language, unislu, (20 more...)

arXiv.org Artificial Intelligence

2507.12951

Country:

Europe (1.00)
North America > United States (0.68)
Asia > Middle East > UAE (0.46)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

Personalized Dictionary Learning for Heterogeneous Datasets

Neural Information Processing SystemsJan-19-2025, 17:15:04 GMT

heterogeneous dataset, personalized dictionary learning, unique feature, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.90)

Add feedback

H-nobs: Achieving Certified Fairness and Robustness in Distributed Learning on Heterogeneous Datasets

Neural Information Processing SystemsJan-19-2025, 01:42:03 GMT

Fairness and robustness are two important goals in the design of modern distributed learning systems. Despite a few prior works attempting to achieve both fairness and robustness, some key aspects of this direction remain underexplored. In this paper, we try to answer three largely unnoticed and unaddressed questions that are of paramount significance to this topic: (i) What makes jointly satisfying fairness and robustness difficult? To address these questions, we first identify data heterogeneity as the key difficulty of combining fairness and robustness. Accordingly, we propose a fair and robust framework called H-nobs which can offer certified fairness and robustness through the adoption of two key components, a fairness-promoting objective function and a simple robust aggregation scheme called norm-based screening (NBS).

certified fairness and robustness, fairness and robustness, h-nob, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

SegHeD+: Segmentation of Heterogeneous Data for Multiple Sclerosis Lesions with Anatomical Constraints and Lesion-aware Augmentation

Basaran, Berke Doga, Matthews, Paul M., Bai, Wenjia

arXiv.org Artificial IntelligenceDec-14-2024

Assessing lesions and tracking their progression over time in brain magnetic resonance (MR) images is essential for diagnosing and monitoring multiple sclerosis (MS). Machine learning models have shown promise in automating the segmentation of MS lesions. However, training these models typically requires large, well-annotated datasets. Unfortunately, MS imaging datasets are often limited in size, spread across multiple hospital sites, and exhibit different formats (such as cross-sectional or longitudinal) and annotation styles. This data diversity presents a significant obstacle to developing a unified model for MS lesion segmentation. To address this issue, we introduce SegHeD+, a novel segmentation model that can handle multiple datasets and tasks, accommodating heterogeneous input data and performing segmentation for all lesions, new lesions, and vanishing lesions. We integrate domain knowledge about MS lesions by incorporating longitudinal, anatomical, and volumetric constraints into the segmentation model. Additionally, we perform lesion-level data augmentation to enlarge the training set and further improve segmentation performance. SegHeD+ is evaluated on five MS datasets and demonstrates superior performance in segmenting all, new, and vanishing lesions, surpassing several state-of-the-art methods in the field.

artificial intelligence, machine learning, segmentation, (20 more...)

arXiv.org Artificial Intelligence

2412.10946

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Multiple Sclerosis (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback